[Update] CloudFormation now supports AWS Parallel Computing Service

2024.12.31

Hello! This is Takakuni (@takakuni_) from the Consulting Department, AWS Business Division.

CloudFormation now supports AWS Parallel Computing Service (AWS PCS).

https://aws.amazon.com/jp/about-aws/whats-new/2024/12/cloudformation-aws-parallel-computing-service/

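AWS Parallel Computing Service, which appeared out of nowhere around the summer of 2024, can now be expressed as IaC, which should speed up verification work. (My blogging pace will probably pick up too.)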

https://dev.classmethod.jp/articles/announcing-aws-parallel-computing-service/

What's in the update

With this update, CloudFormation supports the following resources:

  • AWS::PCS::Cluster
  • AWS::PCS::ComputeNodeGroup
  • AWS::PCS::Queue

Let's create the resources right away. This time we will follow the Getting Started guide.

https://docs.aws.amazon.com/pcs/latest/userguide/getting-started.html

VPC

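Skipping most of the details: for the VPC and the security groups, the Getting Started guide already provides CloudFormation templates.

I reuse those templates here to create the resources.

main.yaml (VPC part)

I followed the procedure below. (I named the stack pcs-blog.)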

https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_create-vpc.html

main.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Create public and private subnets in two or three AZs. Specified CIDR blocks allow 4096 IPs each.

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: VPC
        Parameters:
          - CidrBlock
      - Label:
          default: Subnets A
        Parameters:
          - CidrPublicSubnetA
          - CidrPrivateSubnetA
      - Label:
          default: Subnets B
        Parameters:
          - CidrPublicSubnetB
          - CidrPrivateSubnetB
      - Label:
          default: Subnets C
        Parameters:
          - ProvisionSubnetsC
          - CidrPublicSubnetC
          - CidrPrivateSubnetC

Parameters:
  CidrBlock:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.0.0/16
    Description: VPC CIDR Block (eg 10.3.0.0/16)
    Type: String
  CidrPublicSubnetA:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.0.0/20
    Description: VPC CIDR Block for the Public Subnet A
    Type: String
  CidrPublicSubnetB:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.16.0/20
    Description: VPC CIDR Block for the Public Subnet B
    Type: String
  CidrPublicSubnetC:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.32.0/20
    Description: VPC CIDR Block for the Public Subnet C
    Type: String
  CidrPrivateSubnetA:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.128.0/20
    Description: VPC CIDR Block for the Private Subnet A
    Type: String
  CidrPrivateSubnetB:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.144.0/20
    Description: VPC CIDR Block for the Private Subnet B
    Type: String
  CidrPrivateSubnetC:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.160.0/20
    Description: VPC CIDR Block for the Private Subnet C
    Type: String
  ProvisionSubnetsC:
    Type: String
    Description: Provision optional 3rd set of subnets
    Default: 'True'
    AllowedValues:
      - 'True'
      - 'False'

Mappings:
  RegionMap:
    us-east-1:
      ZoneId1: use1-az6
      ZoneId2: use1-az4
      ZoneId3: use1-az5
    us-east-2:
      ZoneId1: use2-az2
      ZoneId2: use2-az3
      ZoneId3: use2-az1
    us-west-1:
      ZoneId1: usw1-az1
      ZoneId2: usw1-az3
      ZoneId3: usw1-az2
    us-west-2:
      ZoneId1: usw2-az1
      ZoneId2: usw2-az2
      ZoneId3: usw2-az3
    eu-central-1:
      ZoneId1: euc1-az3
      ZoneId2: euc1-az2
      ZoneId3: euc1-az1
    eu-west-1:
      ZoneId1: euw1-az1
      ZoneId2: euw1-az2
      ZoneId3: euw1-az3
    eu-west-2:
      ZoneId1: euw2-az2
      ZoneId2: euw2-az3
      ZoneId3: euw2-az1
    eu-west-3:
      ZoneId1: euw3-az1
      ZoneId2: euw3-az2
      ZoneId3: euw3-az3
    eu-north-1:
      ZoneId1: eun1-az2
      ZoneId2: eun1-az1
      ZoneId3: eun1-az3
    ca-central-1:
      ZoneId1: cac1-az2
      ZoneId2: cac1-az1
      ZoneId3: cac1-az3
    eu-south-1:
      ZoneId1: eus1-az2
      ZoneId2: eus1-az1
      ZoneId3: eus1-az3
    ap-east-1:
      ZoneId1: ape1-az3
      ZoneId2: ape1-az2
      ZoneId3: ape1-az1
    ap-northeast-1:
      ZoneId1: apne1-az4
      ZoneId2: apne1-az1
      ZoneId3: apne1-az2
    ap-northeast-2:
      ZoneId1: apne2-az1
      ZoneId2: apne2-az3
      ZoneId3: apne2-az2
    ap-south-1:
      ZoneId1: aps1-az2
      ZoneId2: aps1-az3
      ZoneId3: aps1-az1
    ap-southeast-1:
      ZoneId1: apse1-az1
      ZoneId2: apse1-az2
      ZoneId3: apse1-az3
    ap-southeast-2:
      ZoneId1: apse2-az3
      ZoneId2: apse2-az1
      ZoneId3: apse2-az2
    us-gov-west-1:
      ZoneId1: usgw1-az1
      ZoneId2: usgw1-az2
      ZoneId3: usgw1-az3
    ap-northeast-3:
      ZoneId1: apne3-az3
      ZoneId2: apne3-az2
      ZoneId3: apne3-az1
    sa-east-1:
      ZoneId1: sae1-az3
      ZoneId2: sae1-az2
      ZoneId3: sae1-az1
    af-south-1:
      ZoneId1: afs1-az3
      ZoneId2: afs1-az2
      ZoneId3: afs1-az1
    ap-south-2:
      ZoneId1: aps2-az3
      ZoneId2: aps2-az2
      ZoneId3: aps2-az1
    ap-southeast-3:
      ZoneId1: apse3-az3
      ZoneId2: apse3-az2
      ZoneId3: apse3-az1
    ap-southeast-4:
      ZoneId1: apse4-az3
      ZoneId2: apse4-az2
      ZoneId3: apse4-az1
    ca-west-1:
      ZoneId1: caw1-az3
      ZoneId2: caw1-az2
      ZoneId3: caw1-az1
    eu-central-2:
      ZoneId1: euc2-az3
      ZoneId2: euc2-az2
      ZoneId3: euc2-az1
    eu-south-2:
      ZoneId1: eus2-az3
      ZoneId2: eus2-az2
      ZoneId3: eus2-az1
    il-central-1:
      ZoneId1: ilc1-az3
      ZoneId2: ilc1-az2
      ZoneId3: ilc1-az1
    me-central-1:
      ZoneId1: mec1-az3
      ZoneId2: mec1-az2
      ZoneId3: mec1-az1

Conditions:
  DoProvisionSubnetsC: !Equals [!Ref ProvisionSubnetsC, 'True']

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref CidrBlock
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: 'Name'
          Value: !Sub '${AWS::StackName}:Large-Scale-HPC'

  VPCFlowLog:
    Type: AWS::EC2::FlowLog
    Properties:
      ResourceId: !Ref VPC
      ResourceType: VPC
      TrafficType: ALL
      LogDestinationType: cloud-watch-logs
      LogGroupName: !Sub '${AWS::StackName}-VPCFlowLogs'
      DeliverLogsPermissionArn: !GetAtt FlowLogRole.Arn

  FlowLogRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - vpc-flow-logs.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - !Ref AWS::NoValue
      Policies:
        - PolicyName: FlowLogPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                  - 'logs:DescribeLogGroups'
                  - 'logs:DescribeLogStreams'
                Resource: !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:${AWS::StackName}-VPCFlowLogs:*'

  PublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetA
      AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetA-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName

  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetB
      AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetB-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName

  PublicSubnetC:
    Type: AWS::EC2::Subnet
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetC
      AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetC-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName

  InternetGateway:
    Type: AWS::EC2::InternetGateway

  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PublicRoute'
  PublicRoute1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnetARouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetA
      RouteTableId: !Ref PublicRouteTable

  PublicSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref PublicRouteTable

  PublicSubnetCRouteTableAssociation:
    Condition: DoProvisionSubnetsC
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetC
      RouteTableId: !Ref PublicRouteTable

  PrivateSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetA
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetA-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName

  PrivateSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetB
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetB-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName

  PrivateSubnetC:
    Type: AWS::EC2::Subnet
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetC
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetC-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName

  NatGatewayAEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc

  NatGatewayBEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc

  NatGatewayCEIP:
    Condition: DoProvisionSubnetsC
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc

  NatGatewayA:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayAEIP.AllocationId
      SubnetId: !Ref PublicSubnetA

  NatGatewayB:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayBEIP.AllocationId
      SubnetId: !Ref PublicSubnetB

  NatGatewayC:
    Type: AWS::EC2::NatGateway
    Condition: DoProvisionSubnetsC
    Properties:
      AllocationId: !GetAtt NatGatewayCEIP.AllocationId
      SubnetId: !Ref PublicSubnetC

  PrivateRouteTableA:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteA'

  PrivateRouteTableB:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteB'

  PrivateRouteTableC:
    Type: AWS::EC2::RouteTable
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteC'

  DefaultPrivateRouteA:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableA
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayA

  DefaultPrivateRouteB:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableB
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayB

  DefaultPrivateRouteC:
    Type: AWS::EC2::Route
    Condition: DoProvisionSubnetsC
    Properties:
      RouteTableId: !Ref PrivateRouteTableC
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayC

  PrivateSubnetARouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableA
      SubnetId: !Ref PrivateSubnetA

  PrivateSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableB
      SubnetId: !Ref PrivateSubnetB

  PrivateSubnetCRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: DoProvisionSubnetsC
    Properties:
      RouteTableId: !Ref PrivateRouteTableC
      SubnetId: !Ref PrivateSubnetC

  AvailabiltyZone1:
    Type: Custom::AvailabiltyZone
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId1]

  AvailabiltyZone2:
    Type: Custom::AvailabiltyZone
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId2]

  AvailabiltyZone3:
    Type: Custom::AvailabiltyZone
    Condition: DoProvisionSubnetsC
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId3]

  LogGroupGetAZLambdaFunction:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: !Sub /aws/lambda/${GetAZLambdaFunction}
      RetentionInDays: 7

  GetAZLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Description: GetAZLambdaFunction
      Timeout: 60
      Runtime: python3.9
      Handler: index.handler
      Role: !GetAtt GetAZLambdaRole.Arn
      Code:
        ZipFile: |
          import cfnresponse
          from json import dumps
          from boto3 import client
          EC2 = client('ec2')
          def handler(event, context):
              if event['RequestType'] in ('Create', 'Update'):
                  print(dumps(event, default=str))
                  data = {}
                  try:
                      response = EC2.describe_availability_zones(
                          Filters=[{'Name': 'zone-id', 'Values': [event['ResourceProperties']['ZoneId']]}]
                      )
                      print(dumps(response, default=str))
                      data['ZoneName'] = response['AvailabilityZones'][0]['ZoneName']
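                  except Exception as error:
                      # Report the failure; reason must be a string to be JSON-serializable
                      cfnresponse.send(event, context, cfnresponse.FAILED, {}, reason=str(error))
                  else:
                      # Only report success when the lookup did not raise
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, data)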
              else:
                  cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}GetAZLambdaFunction

  GetAZLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      Description: GetAZLambdaFunction
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sts:AssumeRole
            Principal:
              Service:
                - !Sub 'lambda.${AWS::URLSuffix}'
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
      Policies:
        - PolicyName: GetAZLambdaFunction
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Sid: ec2
                Effect: Allow
                Action:
                  - ec2:DescribeAvailabilityZones
                Resource:
                  - '*'
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-GetAZLambdaFunction

  S3Endpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      VpcEndpointType: 'Gateway'
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      RouteTableIds:
        - !Ref PublicRouteTable
        - !Ref PrivateRouteTableA
        - !Ref PrivateRouteTableB
      VpcId: !Ref VPC

  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all traffic from resources in VPC
      VpcId:
        Ref: VPC
      SecurityGroupIngress:
        - IpProtocol: -1
          CidrIp: !Ref CidrBlock
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: !Ref CidrBlock

Outputs:
  VPC:
    Value: !Ref VPC
    Description: ID of the VPC
    Export:
      Name: !Sub ${AWS::StackName}-VPC
  PublicSubnets:
    Value: !Join
      - ','
      - - !Ref PublicSubnetA
        - !Ref PublicSubnetB
        - !If
          - DoProvisionSubnetsC
          - !Ref PublicSubnetC
          - !Ref AWS::NoValue
    Description: ID of the public subnets
    Export:
      Name: !Sub ${AWS::StackName}-PublicSubnets
  PrivateSubnets:
    Value: !Join
      - ','
      - - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
        - !If
          - DoProvisionSubnetsC
          - !Ref PrivateSubnetC
          - !Ref AWS::NoValue
    Description: ID of the private subnets
    Export:
      Name: !Sub ${AWS::StackName}-PrivateSubnets
  DefaultPrivateSubnet:
    Description: The ID of a default private subnet
    Value: !Ref PrivateSubnetA
    Export:
      Name: !Sub '${AWS::StackName}-DefaultPrivateSubnet'
  DefaultPublicSubnet:
    Description: The ID of a default public subnet
    Value: !Ref PublicSubnetA
    Export:
      Name: !Sub '${AWS::StackName}-DefaultPublicSubnet'
  InternetGatewayId:
    Description: The ID of the Internet Gateway
    Value: !Ref InternetGateway
    Export:
      Name: !Sub '${AWS::StackName}-InternetGateway'
  SecurityGroup:
    Description: The ID of the local security group
    Value: !Ref SecurityGroup
    Export:
      Name: !Sub '${AWS::StackName}-SecurityGroup'

pcs-cluster-sg.yaml

I followed the procedure below. (I named the stack pcs-blog-sg.)

https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_create-sg.html

AWSTemplateFormatVersion: 2010-09-09
Description: Security group for communications between AWS PCS controller, compute nodes, and client nodes, plus optional inbound SSH security group.

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Network
        Parameters:
          - VpcId
      - Label:
          default: Security group configuration
        Parameters:
          - CreateInboundSshSecurityGroup
          - ClientIpCidr

Parameters:
  VpcId:
    Description: VPC where the AWS PCS cluster will be deployed
    Type: 'AWS::EC2::VPC::Id'
  ClientIpCidr:
    Description: IP address(s) allowed to connect to nodes using SSH
    Default: '0.0.0.0/0'
    Type: String
    AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
    ConstraintDescription: Value must be a valid IP or network range of the form x.x.x.x/x.
  CreateInboundSshSecurityGroup:
    Description: Create an inbound security group to allow SSH access to nodes.
    Type: String
    Default: 'True'
    AllowedValues:
      - 'True'
      - 'False'

Conditions:
  CreateSshSecGroup: !Equals [!Ref CreateInboundSshSecurityGroup, 'True']

Resources:
  ClusterSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Supports communications between AWS PCS controller, compute nodes, and client nodes
      VpcId: !Ref VpcId
      GroupName: !Sub 'cluster-${AWS::StackName}'

  ClusterAllowAllInboundFromSelf:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      SourceSecurityGroupId: !Ref ClusterSecurityGroup

  ClusterAllowAllOutboundToSelf:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      DestinationSecurityGroupId: !Ref ClusterSecurityGroup

  # This allows all outbound comms, which enables HTTPS calls and connections to networked storage
  ClusterAllowAllOutboundToWorld:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      CidrIp: 0.0.0.0/0

  # Attach this to login nodes to enable inbound SSH access.
  InboundSshSecurityGroup:
    Condition: CreateSshSecGroup
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allows inbound SSH access
      GroupName: !Sub 'inbound-ssh-${AWS::StackName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref ClientIpCidr

Outputs:
  ClusterSecurityGroupId:
    Description: Supports communication between PCS controller, compute nodes, and login nodes
    Value: !Ref ClusterSecurityGroup
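  InboundSshSecurityGroupId:
    # Only emit this output when the optional SSH security group is created
    Condition: CreateSshSecGroup
    Description: Enables SSH access to login nodes
    Value: !Ref InboundSshSecurityGroup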
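
The VPC template exports the VPC ID under the name `${AWS::StackName}-VPC`, so with the stack named pcs-blog the export name would be `pcs-blog-VPC`. As a minimal sketch (this is not part of the Getting Started template), the cluster security group could import that value instead of taking a `VpcId` parameter. The hard-coded export name is an assumption tied to the stack name used in this post.

```yaml
Resources:
  ClusterSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Supports communications between AWS PCS controller, compute nodes, and client nodes
      # Assumption: the VPC stack is named pcs-blog, so its export is pcs-blog-VPC
      VpcId: !ImportValue pcs-blog-VPC
      GroupName: !Sub 'cluster-${AWS::StackName}'
```

The trade-off is that an imported export ties the two stacks together: the VPC stack cannot be deleted, nor that export changed, while this stack still imports it.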

EFS

From EFS onward the guide works in the console, so I created the file system by hand, with that warm handmade touch.

(Screenshot: EFS console, ap-northeast-1)

https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_create-efs.html

Note the file system ID. (I did not create a Lustre file system this time.)

(Screenshot: Amazon EFS file system list)
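
The file system could also be expressed in CloudFormation rather than created by hand. The following is a rough sketch under my own assumptions, not something taken from the Getting Started guide: the parameter names, the single mount target, and the throughput mode are all illustrative choices.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Sketch of an EFS file system for the PCS nodes (illustrative, not from the guide)

Parameters:
  # Hypothetical parameters: one private subnet and the cluster security group
  SubnetId:
    Type: AWS::EC2::Subnet::Id
  SecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id

Resources:
  FileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose
      ThroughputMode: elastic

  # One mount target per Availability Zone in use; only one is shown here
  MountTarget:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref FileSystem
      SubnetId: !Ref SubnetId
      SecurityGroups:
        - !Ref SecurityGroupId

Outputs:
  EfsFilesystemId:
    Description: File system ID to pass to the launch template stack
    Value: !Ref FileSystem
```

`!Ref` on `AWS::EFS::FileSystem` returns the file system ID, which is the value the launch template stack expects as `EfsFilesystemId`.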

AWS::PCS::Cluster

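Now we create the PCS cluster. This is where the real fun begins.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-pcs-cluster.html

Specify the security group ID and the subnet IDs created earlier.

I skipped SlurmConfiguration. As for Tags, the type is String, and embedding a List of Key/Value pairs causes an error, so I skipped that too.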

pcs-cluster.yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template

Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'Network'
        Parameters:
          - SecurityGroupIds
          - SubnetIds

Parameters:
  SecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  Cluster:
    Type: AWS::PCS::Cluster
    Properties:
      Name: !Sub 'cluster-${AWS::StackName}'
      Networking:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref SubnetIds
      Scheduler:
        Type: 'SLURM'
        Version: '24.05'
      Size: 'SMALL'
      # Tags:
      #   - Key: Name
      #     Value: !Sub '${Prefix}-cluster'
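
As far as I can tell from the resource reference, Tags on AWS::PCS::Cluster is a map of string keys to string values rather than the usual list of Key/Value objects, which would explain the error with a List. The sketch below assumes that map form, so check it against the reference before relying on it. It also adds an output for the cluster ID, which the later stacks take as the `ClusterId` parameter.

```yaml
Parameters:
  SecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  Cluster:
    Type: AWS::PCS::Cluster
    Properties:
      Name: !Sub 'cluster-${AWS::StackName}'
      Networking:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref SubnetIds
      Scheduler:
        Type: 'SLURM'
        Version: '24.05'
      Size: 'SMALL'
      # Assumption: Tags is a key/value map, not a list of Key/Value objects
      Tags:
        Name: !Sub 'cluster-${AWS::StackName}'

Outputs:
  ClusterId:
    Description: ID of the PCS cluster, used as ClusterId by the node group and queue stacks
    Value: !GetAtt Cluster.Id
```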

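Creating the instance profile

Create the IAM role and instance profile to assign to the node groups.

No template is provided for these, so I wrote one myself.

The role gets AmazonSSMManagedInstanceCore plus permission to register instances with a compute node group (pcs:RegisterComputeNodeGroupInstance).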

pcs-cluster-role.yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template
Resources:
  #################################################
  # EC2 Launch Template Configuration
  #################################################
  Role:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub 'cluster-role-${AWS::StackName}'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: !Sub 'cluster-policy-${AWS::StackName}'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'pcs:RegisterComputeNodeGroupInstance'
                Resource: '*'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'

  InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref Role

https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_create-cng_instance-profile.html
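
The node group stack later takes the instance profile ARN as a parameter. A small addition like the following, appended to pcs-cluster-role.yaml, would surface that value in the stack outputs so it does not have to be looked up in the IAM console. The output name is my own choice.

```yaml
Outputs:
  InstanceProfileArn:
    Description: ARN to pass as LoginNodeInstanceProfileArn / ComputeNodeInstanceProfileArn
    Value: !GetAtt InstanceProfile.Arn
```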

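Launch templates

Node groups run on top of launch templates, so the launch templates come next.

A template is provided for this, so I reuse it.

However, a key pair has to exist beforehand, so I created one from the console.

(Screenshot: Create key pair, EC2 console, ap-northeast-1)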
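
The key pair could also be created by CloudFormation instead of the console. A minimal sketch, assuming a new key pair is acceptable (the logical ID, key name, and key type are illustrative):

```yaml
Resources:
  SshKeyPair:
    Type: AWS::EC2::KeyPair
    Properties:
      KeyName: !Sub 'pcs-${AWS::StackName}'
      KeyType: ed25519

Outputs:
  SshKeyName:
    Description: Key pair name to pass as SshKeyName
    Value: !Ref SshKeyPair
  PrivateKeyParameterName:
    Description: Parameter Store name where EC2 saves the generated private key
    Value: !Sub '/ec2/keypair/${SshKeyPair.KeyPairId}'
```

When no public key material is supplied, EC2 generates the key and stores the private key in Systems Manager Parameter Store, so it has to be retrieved from there rather than downloaded as in the console flow.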

The user data includes the mount steps for EFS and Lustre.

AWSTemplateFormatVersion: 2010-09-09
Description: Launch templates for AWS PCS login and compute node groups, supporting shared EFS and FSx for Lustre file systems

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Security
        Parameters:
          - VpcDefaultSecurityGroupId
          - ClusterSecurityGroupId
          - SshSecurityGroupId
          - SshKeyName
      - Label:
          default: File systems
        Parameters:
          - EfsFilesystemId
          - FSxLustreFilesystemId
          - FSxLustreFilesystemMountName

Parameters:
  VpcDefaultSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Cluster VPC 'default' security group. Make sure you choose the one from your cluster VPC!
  ClusterSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group for PCS cluster controller and nodes.
  SshSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group for SSH into login nodes
  SshKeyName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: SSH key name for access to login nodes
  EfsFilesystemId:
    Type: String
    Description: Amazon EFS file system Id
  FSxLustreFilesystemId:
    Type: String
    Description: Amazon FSx for Lustre file system Id
  FSxLustreFilesystemMountName:
    Type: String
    Description: Amazon FSx for Lustre mount name

Resources:
  LoginLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub 'login-${AWS::StackName}'

      LaunchTemplateData:
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: HPCRecipes
                Value: 'true'
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 4
          HttpTokens: required
        KeyName: !Ref SshKeyName
        SecurityGroupIds:
          - !Ref ClusterSecurityGroupId
          - !Ref SshSecurityGroupId
          - !Ref VpcDefaultSecurityGroupId
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/cloud-config; charset="us-ascii"
            MIME-Version: 1.0

            packages:
            - amazon-efs-utils

            runcmd:
            # Mount EFS filesystem as /home
            - mkdir -p /tmp/home
            - rsync -aA /home/ /tmp/home
            - echo "${EfsFilesystemId}:/ /home efs tls,_netdev" >> /etc/fstab
            - mount -a -t efs defaults
            - if [ "enabled" == "$(sestatus | awk '/^SELinux status:/{print $3}')" ]; then setsebool -P use_nfs_home_dirs 1; fi
            - rsync -aA --ignore-existing /tmp/home/ /home
            - rm -rf /tmp/home/
            # If provided, mount FSxL filesystem as /shared
            - if [ ! -z "${FSxLustreFilesystemId}" ]; then amazon-linux-extras install -y lustre=latest; mkdir -p /shared; chmod a+rwx /shared; mount -t lustre ${FSxLustreFilesystemId}.fsx.${AWS::Region}.amazonaws.com@tcp:/${FSxLustreFilesystemMountName} /shared; chmod 777 /shared; fi

            --==MYBOUNDARY==

  ComputeLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub 'compute-${AWS::StackName}'
      LaunchTemplateData:
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: HPCRecipes
                Value: 'true'
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 4
          HttpTokens: required
        SecurityGroupIds:
          - !Ref ClusterSecurityGroupId
          - !Ref VpcDefaultSecurityGroupId
        KeyName: !Ref SshKeyName
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/cloud-config; charset="us-ascii"
            MIME-Version: 1.0

            packages:
            - amazon-efs-utils

            runcmd:
            # Mount EFS filesystem as /home
            - mkdir -p /tmp/home
            - rsync -aA /home/ /tmp/home
            - echo "${EfsFilesystemId}:/ /home efs tls,_netdev" >> /etc/fstab
            - mount -a -t efs defaults
            - if [ "enabled" == "$(sestatus | awk '/^SELinux status:/{print $3}')" ]; then setsebool -P use_nfs_home_dirs 1; fi
            - rsync -aA --ignore-existing /tmp/home/ /home
            - rm -rf /tmp/home/
            # If provided, mount FSxL filesystem as /shared
            - if [ ! -z "${FSxLustreFilesystemId}" ]; then amazon-linux-extras install -y lustre=latest; mkdir -p /shared; chmod a+rwx /shared; mount -t lustre ${FSxLustreFilesystemId}.fsx.${AWS::Region}.amazonaws.com@tcp:/${FSxLustreFilesystemMountName} /shared; fi

            --==MYBOUNDARY==

Outputs:
  LoginLaunchTemplateId:
    Description: 'Login nodes template ID'
    Value: !Ref LoginLaunchTemplate
  LoginLaunchTemplateName:
    Description: 'Login nodes template name'
    Value: !Sub 'login-${AWS::StackName}'
  ComputeLaunchTemplateId:
    Description: 'Compute nodes template ID'
    Value: !Ref ComputeLaunchTemplate
  ComputeLaunchTemplateName:
    Description: 'Compute nodes template name'
    Value: !Sub 'compute-${AWS::StackName}'

Node groups

Create one node group for login and one for compute.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-pcs-computenodegroup.html#cfn-pcs-computenodegroup-iaminstanceprofilearn

Let's put the newly added AWS::PCS::ComputeNodeGroup to use.

For the AMI ID, I used a sample AMI.

https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_samples.html

AWSTemplateFormatVersion: 2010-09-09
Description: Sample template

Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'Network'
        Parameters:
          - SubnetIds
      - Label:
          default: 'PCS Cluster'
        Parameters:
          - ClusterId
      - Label:
          default: 'Login node group'
        Parameters:
          - LoginNodeInstanceProfileArn
          - LoginNodeLaunchTemplateId
          - LoginNodeSubnetIds
      - Label:
          default: 'Compute node group'
        Parameters:
          - ComputeNodeInstanceProfileArn
          - ComputeNodeLaunchTemplateId
          - ComputeNodeSubnetIds

Parameters:
  ClusterId:
    Type: String
  LoginNodeInstanceProfileArn:
    Type: String
  LoginNodeLaunchTemplateId:
    Type: String
  LoginNodeSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
  ComputeNodeInstanceProfileArn:
    Type: String
  ComputeNodeLaunchTemplateId:
    Type: String
  ComputeNodeSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  LoginNodeGroup:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      Name: 'login-node-group'
      # aws-pcs-sample_ami-amzn2-x86_64-slurm-24.05-2024-12-14T05-28-32.441Z at ap-northeast-1
      AmiId: 'ami-0e18e980afc64cc20'
      ClusterId: !Ref ClusterId
      CustomLaunchTemplate:
        Id: !Ref LoginNodeLaunchTemplateId
        Version: 1
      IamInstanceProfileArn: !Ref LoginNodeInstanceProfileArn
      InstanceConfigs:
        - InstanceType: 'c6i.xlarge'
      PurchaseOption: 'ONDEMAND'
      ScalingConfiguration:
        MaxInstanceCount: 1
        MinInstanceCount: 1
      SubnetIds: !Ref LoginNodeSubnetIds

  ComputeNodeGroup:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      Name: 'compute-node-group'
      # aws-pcs-sample_ami-amzn2-x86_64-slurm-24.05-2024-12-14T05-28-32.441Z at ap-northeast-1
      AmiId: 'ami-0e18e980afc64cc20'
      ClusterId: !Ref ClusterId
      CustomLaunchTemplate:
        Id: !Ref ComputeNodeLaunchTemplateId
        Version: 1
      IamInstanceProfileArn: !Ref ComputeNodeInstanceProfileArn
      InstanceConfigs:
        - InstanceType: 'c6i.xlarge'
      PurchaseOption: 'ONDEMAND'
      ScalingConfiguration:
        MaxInstanceCount: 4
        MinInstanceCount: 0
      SubnetIds: !Ref ComputeNodeSubnetIds

Queue

Finally, create the queue. This uses the new AWS::PCS::Queue resource.

AWSTemplateFormatVersion: 2010-09-09
Description: Sample template

Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'PCS Cluster'
        Parameters:
          - ClusterId
      - Label:
          default: 'Compute node group'
        Parameters:
          - ComputeNodeGroupId

Parameters:
  ClusterId:
    Type: String
  ComputeNodeGroupId:
    Type: String

Resources:
  Queue:
    Type: AWS::PCS::Queue
    Properties:
      Name: !Sub 'queue-${AWS::StackName}'
      ClusterId: !Ref ClusterId
      ComputeNodeGroupConfigurations:
        - ComputeNodeGroupId: !Ref ComputeNodeGroupId

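Everything appears to have been created successfully, all the way to the end.

(Screenshot: Parallel Computing Service console, ap-northeast-1)

Summary

That was "CloudFormation now supports AWS Parallel Computing Service."

I created the stacks step by step, but I felt that many of them could be joined together.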
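
As a rough illustration of what joining them could look like, the sketch below puts the cluster, one compute node group, and the queue into a single stack and wires them together with `!GetAtt` instead of passing IDs by hand. I have not deployed it in this form. The parameter names are hypothetical, and the login node group is left out to keep it short.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Sketch of a PCS cluster, one compute node group, and a queue in one stack

Parameters:
  SecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
  InstanceProfileArn:
    Type: String
  ComputeLaunchTemplateId:
    Type: String
  AmiId:
    Type: AWS::EC2::Image::Id

Resources:
  Cluster:
    Type: AWS::PCS::Cluster
    Properties:
      Name: !Sub 'cluster-${AWS::StackName}'
      Networking:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref SubnetIds
      Scheduler:
        Type: 'SLURM'
        Version: '24.05'
      Size: 'SMALL'

  ComputeNodeGroup:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      Name: !Sub 'compute-${AWS::StackName}'
      AmiId: !Ref AmiId
      # The cluster ID comes straight from the resource above
      ClusterId: !GetAtt Cluster.Id
      CustomLaunchTemplate:
        Id: !Ref ComputeLaunchTemplateId
        Version: '1'
      IamInstanceProfileArn: !Ref InstanceProfileArn
      InstanceConfigs:
        - InstanceType: 'c6i.xlarge'
      PurchaseOption: 'ONDEMAND'
      ScalingConfiguration:
        MaxInstanceCount: 4
        MinInstanceCount: 0
      SubnetIds: !Ref SubnetIds

  Queue:
    Type: AWS::PCS::Queue
    Properties:
      Name: !Sub 'queue-${AWS::StackName}'
      ClusterId: !GetAtt Cluster.Id
      ComputeNodeGroupConfigurations:
        - ComputeNodeGroupId: !GetAtt ComputeNodeGroup.Id
```

Because the node group and the queue reference the cluster through `!GetAtt`, CloudFormation infers the creation order and no explicit `DependsOn` is needed.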

It's a nice update that should speed up verification work. This was Takakuni (@takakuni_) from the Consulting Department, AWS Business Division!

© Classmethod, Inc. All rights reserved.